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PROPAGATION OF VIRUSES THROUGH AN INFORMATION 
TECHNOLOGY NETWORK 

The present invention relates to the propagation of viruses through a network of 
5 interconnected processing entities. 

In current network environments virtually any processing entity (or "host") is at one 
time or another connected to one or more other hosts. Thus for example in the case of 
an IT environment, a host in the form of a computer (such as a client, a server, a 
10 router, or even a printer for example) is frequently connected to one or more other 
computers, whether within an intranet of a commercial organisation, or as part of the 
Internet. Alternatively, in the case of a communications technology environment, a 
host in the form of a mobile telephone is, merely by virtue of its intrinsic purpose, 
going to be connected to one or more other hosts from time to time, and an inevitable 
15 result is that the opportunities for the propagation of viruses are enhanced as a result. 
For example in the case of a computer virus known as the "Code Red" virus, once 
assimilated within a host the virus operates to generate Internet Protocol ("IP") 
addresses of other potential hosts at random, and then instructs the host to send a copy 
of the virus to each of these randomly-generated IP addresses. Although not all of the 
20 potential hosts are genuine (since the IP addresses are randomly generated), sufficient 
of the randomly generated addresses are real addresses of further hosts to enable the 
virus to self propagate rapidly through the Internet, and as a result to cause a 
substantial drop in performance of many commercial enterprise's computing 
infrastructure. 

25 

Within the context of this specification a virus is data which is assimilable by a host to 
cause a deleterious effect upon the performance of either: the aforesaid host; one or 
more other hosts; or a network of which any of the above-mentioned hosts are a part. 
Thus for example, a virus may act by becoming assimilated within a first host, and 
30 subsequent to its assimilation may then cause deleterious effects within that first host, 
such as corruption and/or deletion of files. In addition the virus may cause self- 
propagation to one or more further hosts at which it will then cause similar 
corruption/deletion and further self-propagation. Alternatively the virus may merely 
be assimilated within the first host and cause no deleterious effects whatsoever, until 
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communications established by one or more hosts with other hosts within the network 
is monitored to determine whether the or each monitored host is infected, and upon 
detection of an infection, data by which communications from the or each monitored 
host are established with, and possibly additional data besides, are recorded. 
5 Depending upon the precise nature of the data recorded, it may then be used to 
establish inter alia : the nature of the virus, the manner in which it propagates. 

Embodiments of the invention will now be described, by way of example, and with 
reference to the accompanying drawings, in which: 

10 

Fig. 1 is a schematic representation of one form of network architecture; 

Fig. 2 is a schematic illustration of the conventional operational architecture of a 
computing entity forming a part of, for example, the network of Fig. 1; 

15 

Fig. 3 is a schematic illustration of establishment of a connection in accordance with 
an application protocol from Fig. 2; 

Fig. 4 is a schematic illustration of data transmission in accordance with a further 
20 application protocol from Fig. 2 

Fig. 5 is a schematic illustration of an operational architecture according to an 
embodiment of the present invention of a computing entity forming a part of a 
network; 

25 

Fig. 6 is a graphical representation of the operation of an embodiment of method 
according to the present invention; 

Fig. 7 is a flowchart illustrating the operation of the method of Figs. 6 

30 

Figs. 8 A and B are flowcharts illustrating further aspects of embodiments of method 
according to the present invention. 
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above, and in conjunction with Internet Protocol (IP). Finally, the network stack 400 
includes a system programme known as a driver 410 for the network card, which in 
essence is low level software that controls the network card. 

In the present illustrated examples, the process of establishing a connection in 
accordance with HTTP will be considered. Usually a request for such a connection is 
made by the web browser application programme, and this in turn is most likely to be 
at the behest of a user operating the web browser. Where this is the case, the request 
will identify the address or '"URL" within the network of the computing entity with 
which a connection is sought, initially using alphanumeric characters entered at the 
address bar of the browser application programme (for example http://www.hp.comV 
Ultimately however these are Resolved" into a numerical "IP address" of the form: 
xxx.xxx.xxx.xxx, where x is either an integer or no number at all and each set of three 
X's is an integer no greater than 255. An example of an IP address is 192.168.2.2. 
The IP address is subsequently further resolved into what is known as a physical, or 
Media Access Control ("MAC") address of the network card of the destination 
computing entity. Resolution of the URL into an IP address, and the IP address to a 
MAC address usually takes place at dedicated computing entities within the network, 
in a manner which is well known per se, and will not be described further herein. 
This description of the connection process in accordance with HTTP, well known per 
se 9 has described connections legitimately requested by a user, and by means of a 
URL. However it should be appreciated that it is possible for example to request a 
connection from the web browser application programme using an IP address, rather 
than the alphanumeric characters of the URL. This is an aspect of the system 
behaviour which has been exploited by viruses, some of which randomly generate IP 
addresses in accordance with the rules governing their allowable format, and then 
seek connection to those randomly generated addresses. 

In the context of the present application it should be appreciated that the term 
"connection" is a term of art, and is used to refer to a manner of transmitting 
messages in which acknowledgement of receipt of data is required, so that in the 
absence of an acknowledgement the connection is deemed either not to have been 
established, or to have failed, and the transmitted message deemed not to have 
arrived. On application protocol which operates using connections is HTTP, and an 
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connections using the same protocol this may be used to distinguish one such 
connection from the other; a flag indicating that the synchronisation status of the 
requesting entity is set to "on" (meaning that sequence numbers - which indicate the 
order of the packet in a total number of packets sent - between the requesting and 
5 destination computing entity are to be synchronised), and an initial sequence number 
50 (this could be any number). Upon receipt of this packet, the destination machine 
sends back a packet P20 identifying the source port as 80, the destination port as 
3167, a flag indicating that the acknowledgement status is "on", an acknowledgement 
number 51 which augments the sequence number by one, and its own synchronisation 

10 flag number 200. When the requesting entity receives this packet it returns a further 
packet P30 once again identifying the source and destination ports, and a flag 
indicating that its acknowledgement status is on, with an acknowledgement number 
201 (i.e. which augments the sequence number by one). Once this exchange is 
complete, a connection between the client and server entities is defined as being open, 

15 and both the client and server entities send messages up through their respective 

network stacks to the relevant application programmes indicating that a connection is 
open between them. In connection with the socket, it should also be noted that the 
socket comprises an area 460 allocated to store the actual body of the message which 
it is desired to transmit (sometimes known as the outbound message content, or the 

20 outgoing payload), and similarly a further area 470 allocated to store the body of 
messages which are received (inbound message content, or incoming payload). 

When the outgoing payload is to be transmitted, the TCP layer breaks it up into 
packets (i.e. data structures such as those illustrated above in Fig. 3, but further 
25 including at least part of the payload), and the IP layer attaches an IP address header. 
When an incoming message arrives, it passes up through the network stack, i.e. from 
the network card 340, up through the Internet Protocol software, etc., and is written in 
to the relevant socket (as identified, inter alia from the port number), from which the 
application programme retrieves the incoming payload. 

30 

Data may alternatively be transmitted using the protocols RSTP/UDP/IP (indicating 
the hierarchy of protocols in the network stack adopted in conjunction with each other 
to transmit the data) which do not require a connection; the dispatching entity sends a 
packet to the destination entity, and does not require an acknowledgement of receipt. 
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destination hosts. The number of new hosts allowed per time window, and the value 
of N are determined on the basis of the policy, typically defined by a system 
administrator, and the policy is preferably formulated to take account of the nature of 
non virally-related network traffic. In this way, the VPMS operates to monitor the 
5 speed at which a virus resident on the host may propagate from that host to other hosts 
within the network. 

Referring to Fig. 6 A, over the course of a time window Tl, various applications 
programmes running on the workstation send requests via the VPMS to send data 

10 (whether by connection or otherwise) to other hosts within the network ("outbound 
requests")* the email application programme, which requests dispatch of an email 
message (having multiple addressees) to a mail server, Mail (Request A) using SMTP, 
the file management application programme requesting dispatch of a file recording a 
text document to another user (Request B) via FTP, and the web browser programme 

15 which requests connection, (typically via a Web Proxy server), W/Server in order to 
connect to a site using HTTP (Request C). In the present example, outbound requests 
to the VPMS from each of these hosts are requests to send data to an identified 
destination host, and are ultimately manifested by the dispatch of one or more data 
packets in accordance with the relevant application protocol. The term "request" is 

20 intended to be interpreted broadly to encompass any indication (usually from an 
application programme, although by no means necessarily) that contact with a 
destination host is sought, and for ease of terminology, the transmission of a request is 
to be interpreted as indicating that data is transmitted pursuant to a request to transmit 
such data. 

25 

The VPMS operates in accordance with a routine illustrated in Fig. 7, whose features 
will now be described in more detail in conjunction with Figs. 6A-C, although Fig. 7 
should be regarded as a generic illustration of the operation of the VPMS routine, 
rather than a specific illustration of individual event depicted in Figs. 6. As explained 
30 above, the VPMS operates with reference to a series of time intervals, or windows, 
which in the present example are of constant length. The routine is initiated at step 
702 by a clock (typically the clock which defines the time windows) indicating that a 
time window has commenced. At step 704 the routine then updates a dispatch record, 
which is a record of the identities of a predetermined number N (which in this 
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in the dispatch record 612 for this time window T2 is performed in order to establish 
whether the request is new. The dispatch record however is now a genuine record of 
the identities of the three hosts contacted most recently during the previous time 
window Tl (although coincidentally this is identical to the default dispatch record). 

5 Upon receipt of Request D, the VPMS routine for that request establishes at step 708 
that the identity of this host is not in the dispatch record 612, i.e. that it is a new 
destination host. It therefore proceeds to step 712, where it adds a copy of the 
Request D to a virtual buffer whose contents are shown in Fig. 6C, and then ends at 
710. In one preferred embodiment, the entire contents of the socket relating to 

10 Request D are duplicated in the virtual buffer. However in an alternative 

embodiment, where for example the payload is large, this is omitted. On receipt of 
Request B, the VPMS establishes at a step 708 that B is present in the dispatch record, 
and so the VPMS routine ends at step 710. Request E is also a new request within the 
time window T2 and so at a step 712 the identity of host E is added to the virtual 

15 buffer. 

Because receipt of requests are the trigger for the commencement of the routine 
illustrated in Fig. 7, neither the number of occasions in a given time window in which 
the VPMS routine is run, nor the timing of their commencement can be known in 

20 advance. Additionally, as illustrated in Fig. 7, it is possible for two (or indeed more, 
although only two are illustrated in Fig. 7) routines to be running in temporal overlap, 
since one may still be running when another is triggered by a further request. 
Similarly, a request may trigger the execution of the routine of Fig. 7 just prior to the 
end of a time window (a situation also illustrated in Fig. 7, with steps which occur at 

25 the end of a time window/the beginning of a subsequent time window being shown in 
dashed lines), so that the execution of the routine may overlap temporally with a part 
of the next time window. The approach taken by this embodiment of the present 
invention to this issue of overlap is relatively simple: if at the commencement of time 
window T„+i, the update of the dispatch record for a previous time window T n has 

30 been completed during the simultaneous running of a VPMS routine commenced in 
the previous time window T n > but prior to execution the step 712 (adding a request to 
the virtual buffer) for that routine, the subsequent update of the virtual buffer in that 
step 712 will be treated as if performed for a request received in the current time 
window T n+ i . This approach has the benefit of being simple, although it may on 
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example, following the transmission of the Request D, is one (i.e. the single Request 
E). Thus, in the same way that the dispatch record is a record of all requests which 
have been transmitted in accordance with policy, the virtual buffer is effectively a 
record of all requests which have been transmitted outside that policy. 

5 

The role of the virtual buffer is to enable a determination to be made with regard to 
whether the host upon which the VPMS is running is virally infected. One way in 
which this can be manifested is the size of the virtual buffer. A state of viral infection 
may therefore be defined in terms of the size of the buffer, and the stage of any such 

1 0 viral infection by the rate of change of the buffer size. This follows from the 

generally different behaviour of virally-related and non virally-related network traffic, 
in that non virally-related or "legitimate" network traffic usually involves contacting 
only a relatively small number of new destination hosts, whereas, because viruses 
tend to propagate by transmission to as many disparate destination hosts as possible, 

15 an instance of a large number of requests to contact a new destination host will 

typically be indicative of viral infection. The virtual buffer may be thought of as a 
queue of virtual new requests waiting for opportunities to be virtually transmitted in 
accordance with policy (since their "counterpart" real requests are simply transmitted 
without hindrance). The size of the virtual buffer is therefore one indication of 

20 whether there is viral infection, since a large buffer size is indicative of a large 

number of requests to contact a new host within a short space of time. An alternative 
indication of viral infection may be the existence of an increasing buffer size. 
Conversely, generally speaking a buffer size which is steadily declining from a 
relatively high value may be indicative of a temporary increase in legitimate traffic 

25 levels. It can be seen therefore that buffer size may be used to interpret the existence 
of viral infection with varying levels of complexity, the interpretation typically being 
something which is defined in the policy. 

A second buffer management routine, illustrated in Fig. 8B monitors the virtual 
30 buffer, and is triggered by performance of step 814 from the routine of Fig. 8 A, i.e. an 
update in the value of the variable LogNo, following which, at decision step 842, the 
routine determines whether the size of the buffer is greater than a quantity Vj, which 
the policy has determined represents viral infection, whereupon at step 844 it 
generates a virus alert. This may simply be a visual alert to a user of the workstation 

PDNO 200207546 



15 



substantial number of the requests transmitted during the course of time interval T4 
have been virally related. 

In the event that a viral warning is generated, various further actions may then be 
taken, the majority of which are directed toward finding out more about the nature of 
any possible virus. Specifically the type of information sought may typically include: 
the destinations to which a virus has been propagated, where applicable the 
application programme or programmes which it uses to propagate itself, and the 
action and behaviour of the virus. The nature of the information which may obtained 
directly from the virtual buffer, or which may be deduced therefrom depends to an 
extent upon the nature of the data stored in the virtual buffer, and the operating system 
of the host concerned. For example in the case of one preferred embodiment in which 
the virtual buffer simply copies the socket, including payload, the destination host will 
be recorded in the buffer, and possibly, in the case where the virus copies itself to the 
socket as the outgoing payload, also the virus. Additionally, where the operating 
system records an identifier in the socket denoting the application programme 
requesting the socket, and an ability to map this process identifier to the requesting 
application programme after the socket has been closed (remembering that the virtual 
buffer contains a copy of the socket, while the actual socket is transient since it is 
used to implement the request to send data and is then deleted), then the application 
programme responsible for requesting data transmission can be identified. The use of 
the data in a socket is only one way in which to collect data relating to possible viral 
infection, and when using sockets, depending upon the extent of the data collected, 
the of copying of the sockets is likely to vary. For example, if, as referenced above, 
the fullest data (including e.g. copies of the payload) is to be retained, further copies 
of the sockets in the virtual buffer (stored for example in a manner which tags them to 
the copy of the socket in the virtual buffer) are preferably made over time as the 
contents of the socket changes over time. However, because two functional elements 
within the host may cause a change in the data in a socket (e.g. the writing of 
outgoing data to a socket by an application programme, and removal from the socket 
of outgoing data by the network stack), maintaining a complete record may 
nevertheless still be difficult simply from observing the contents of sockets. 
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the mail server will in fact effectively allow the workstation to contact multiple other 
hosts within the network (i.e. the specified addressees) all of which may be new, even 
though, in accordance with the routine described in connection with Fig. 7, the 
outbound carrier request will only count as a single request which may not even be 

5 recognised as new if, as may be likely, the mail server is identified in the current 

dispatch record. In such a situation therefore, if the VPMS operates simply to record 
in the virtual buffer those new destination hosts to be contacted per time window on 
the basis only of those destination hosts which are ostensibly identified in the 
outbound request, the desired monitoring of viral propagation may be circumvented or 

1 0 reduced, because a single outb ound request specifying the mail server does n ot 
necessarily represent only a single email subsequently propagating through the 
network after processing and forwarding by the mail server. 

In a modification of the embodiment thus far described therefore, the VPMS includes 
15 within its routine a step of identifying the application programme by which an 

outbound request has been generated. Because certain applications programmes are 
more likely than others to use outbound carrier requests which invoke the use of a 
proxy (for example the above-mentioned instance of email, or the case of a web 
browser programme) it is possible in advance to specify criteria, based on the 
20 provenance of an outbound request, identifying those outbound requests likely to be 
carrier requests. If the packet is generated by one such specified application 
programme, then the VPMS invokes the use of the application protocol concerned to 
reveal the identities of the destination hosts specified in the sub-requests; here the 
eventual addressees for whom the email message is intended. Once the identities of 

25 the genuine or ultimate addressees have been obtained, there are several options for 
processing the request. In accordance with one alternative the identities of the 
destination hosts specified in the sub-request can be regulated in accordance with the 
same policy which applies to all other requests, and they can be matched against the 
host identities within the dispatch record in the manner previously described in the 

30 embodiment of Figs. 

Since in the case for example of email, the use of outbound carrier requests to a host 
acting as a proxy for the ultimate addressees of the email messages is the norm, it is, 
in a modification, possible for different versions of VPMS to run simultaneously, 
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CLAIMS 

1 . A method of monitoring propagation of viruses within a network of hosts 
comprising the steps of: 

establishing a record which is at least indicative of identities of hosts within 
the network to whom data has been sent by a first host ("destination hosts"); 

during a first time interval, comparing (a) identities of destination hosts 
identified in requests to send data from the first host and (b) identities of destination 
hosts identified in the record; 

transmitting all reque sts to send data; 

storing in a buffer data relating to requests which identify a destination host 
not in the record. 

2. A method according to claim 1 wherein the record is established by 
monitoring identities of destination hosts to whom requests have been transmitted 
during a second time interval, which precedes the first time interval. 

3. A method according to claim 2, wherein the record contains a predetermined 
maximum number of destination host identities, the maximum number being defined 
in accordance with a policy. 

4. A method according to claim 3, wherein the policy additionally defines a 
maximum number of destination host identities not in the record, to whom requests 
may be legitimately transmitted in accordance with policy. 

5. A method according to claim 4 further comprising the step, at the end of any 
given time interval, of deleting from the buffer data relating to requests transmitted 
during the given time interval in accordance with policy. 

6. A method according to claim 5 further comprising the step, at the end of the 
given time interval, of updating the record to reflect identities of hosts identified in 
requests which are transmitted in accordance with policy during the given time 
interval. 
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15. A method according to claim 14 wherein at the end of a time interval, socket 
data containing identities of destination hosts in respect of whom sockets have 
legitimately been created is deleted from the buffer. 

5 16. A method according to claim 10 further comprising the step, in the event that 
the number of socket data items stored in the buffer exceeds a predetermined value, of 
storing outgoing packets from the first host. 

17. A method according to claim 16 wherein packets having a designated 
1 0 destination IP address are stored. 

18. A method according to claim 17 further comprising the step of establishing the 
predetermined IP address from the socket data stored in the buffer. 

15 19. A method according to claim 1 0 further comprising the step, in the event that 
the number of socket data items stored in the buffer exceeds a predetermined value, of 
storing incoming packets to the first host. 

20. A method according to claim 19 wherein packets having a designated source 
20 IP address are stored. 

21 . A method according to claim 20, further comprising the step of establishing 
the predetermined IP address from the socket data stored in the buffer. 

25 



PDNO 200207546 



Is 



o 



O 




WEB 
BROWSER 





H 






T 






T 






P 




TCP 


IP 




N/W CARD 




DRIVER 



N/W CARD 



CLIENT ENTITY 



Fig. 3 



L 
O 
C 
A 
L 

T 
I 

M 
E 





P10 



SOURCE 
PORT : 3167 



DESTINATION 
PORT: 80 



SEQUENCE NUMBER: 50 



ACKNOWLEDGEMENT No. 



P20 



SOURCE 
PORT : 80 



DESTINATION 
PORT: 3617 



SEQUENCE NUMBER: 51 



ACKNOWLEDGEMENT 
NUMBER: 200 



SOURCE 
PORT : 3167 



DESTINATION 
PORT: 80 



SEQUENCE NUMBER: 



ACKNOWLEDGEMENT 
NUMBER: 201 




L 

O 
C 
A 
L 

T 
I 

M 
E 



WEB 
BROWSER 





H 






T 






T 






P 




TCP 


IP 




N/W CARD 




DRIVER 



N/W CARD 



SERVER ENTITY 




Fig. 4 



CD 



CO 



E 

E 
E 



CM 

a> _ 

is 

— m 



CD 



. CD 

? UJ ^ 
C ^ CO 

CO cm 



<D 

Q> CM 

Q- Tt 
CO 

x is 



o 



_K 



^ co 
a) 

= -o "O 



a> a> X, — 

to co | — r-^vr 

co co ^ 
a> cd 

1— »w ' 



% 

a 
c 

o 

-*— » 

CD 



CD 
Q_ 



CD 



< 

CD 



LU 


a 


o 




CM 


CO 












o 


X 






— > 










< 


CD 








< 



Q 


O 


CD 




CM 


CO 



CM 
CD 



o 


CO 


< 




CM 


CO 



o 

CD 



^5 



LU 



E 



LU 



E 



< 

u_ 

UJ 

a 



CO 
CO 



O 

CD 



UPDATE 
DISPATCH 
RECORD 




704 



AD D TO ' 
V/BUFFER 



712 



END 



710 



712 



710 



Fig. 7 



▼ 



CLOCK 
TIMEOUT 



CLOCK 
TIMEOUT 



804 




808- 



810 



812 



814- 



DELETE 
RQ1 



DELETE 
IDENTICAL 
RQnS 



UPDATE 
DISPATCH 
RECORD 



806 



802 



803 



N 





842 



VIRUS 
ALERT 



846 



814 



N 



844 




INFECTION 
INCREASING 
ALERT 



Fig. 8B 



Fig. 8A 



